Data Island

Jun 04, 2019

Geospatial Visualisations

Geospatial: relating to or denoting data that is associated with a particular location, according to Oxford Dictionaries.

For those who know my background in the creative field, it should be no surprise that I get ecstatic whenever it is time for visualizations in my data science projects. This semester I have had a whole subject on the matter, which has been such a joy. Today, while preparing for the exam, I will go over one particular type of visualization: maps.

There are a lot of guidelines one should follow to get this type of visualization right. To go over them I will use an example that I made, and explain what is right and what is wrong with it. The map looks into the search term "Bullshit", inspired by this research article, and uses data from Google.

In [20]:
# <!-- collapse=True -->
# IPython.display is the public module; importing from IPython.core.display is deprecated
from IPython.display import display, HTML, Image
display(HTML(
"""<iframe title="[The World of Bullshit ]" aria-label="World choropleth map" id="datawrapper-chart-2K8W0" src="//datawrapper.dwcdn.net/2K8W0/1/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="554"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",function(a){if(void 0!==a.data["datawrapper-height"])for(var e in a.data["datawrapper-height"]){var t=document.getElementById("datawrapper-chart-"+e)||document.querySelector("iframe[src*='"+e+"']");t&&(t.style.height=a.data["datawrapper-height"][e]+"px")}})}();</script>"""))

The world does not look the way you think it does

This was maybe one of the biggest learning moments for me in the visualization course. There is no perfect way to show the world flat, so in order to do this, it is projected onto a model of what it looks like (it is round in real life, remember?). The most common one is called the Mercator projection, and that was the one used when I learned about geography in the 90s. This is a so-called conformal projection, which means it preserves angles and therefore the local shapes of the continents. It can be great for mapping small areas, like your hometown or maybe a country (I live in Denmark, so here I guess it would do as good a job as any), but it sucks for visualizing global areas. The Mercator was originally developed for sea navigation in 1569 by Gerardus Mercator. You can think of it as if the world had been wrapped in a cylinder: roll it out and there you have the map. The reason it worked so well for ships was that you could draw a straight line, follow it and keep a constant compass direction for where to sail. The reason it does not work well for visualizations is that the further away from the equator you get, the larger the areas appear. This makes e.g. Norway, Greenland, and North America look huge compared to central Africa and Asia. Be aware: this is also the most common map projection in visualization tools, and it is absolutely something you should avoid when visualizing the world in your work. I love this YouTube video from Vox explaining the problem.

In [19]:
# <!-- collapse=True -->
from IPython.display import display, HTML
display(HTML("""<iframe width="560" height="315" src="https://www.youtube.com/embed/kIID5FDi2JQ" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>"""
            ))
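To put a number on how much Mercator inflates areas away from the equator, here is a minimal sketch. It relies on the standard property that the Mercator linear scale factor at latitude φ is 1/cos(φ), so areas grow by its square. The function name and the rough latitudes are my own assumptions for illustration, not something taken from the course material.

```python
import math


def mercator_area_inflation(latitude_deg):
    """Factor by which Mercator inflates areas at a latitude, relative to the equator."""
    # Linear scale factor is 1 / cos(latitude); areas scale with its square.
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2


# Rough, hand-picked latitudes (assumptions for illustration only).
examples = [("Equator", 0), ("Copenhagen (approx.)", 56), ("Central Greenland (approx.)", 72)]

for name, latitude in examples:
    print(f"{name}: areas drawn x{mercator_area_inflation(latitude):.1f}")
```

At 60 degrees north the factor is already 4, which is why the Nordic countries look so much bigger than they are.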

So, maps are models that can only offer trade-offs. But which trade-off should you go for?

In our course book "The Truthful Art" from 2016, Alberto Cairo has made a nice overview of which projections to use for different use cases (page 269). The projections he suggests for visualizing the whole world are the Mollweide, Hammer, Eckert IV or Robinson. For the map I have made using DataWrapper I think it is a Hammer projection. (I could not find a source to know it precisely, so if anyone reading this knows, please contact me on Twitter and I will correct it accordingly, thanks!) Both Hammer and Mollweide are examples of what is called equal-area projections. In contrast to the Mercator map, they preserve the sizes of areas and are therefore more accurate in that matter, but at the same time shapes are distorted. Looking at The World of Bullshit, you can see that it looks stretched. For visualizing Asia, Europe, North America or Australia, Cairo recommends using Lambert azimuthal or Albers equal-area projections. For South America and Africa, Mollweide, Sinusoidal or the Lambert azimuthal are good choices. There are also projections that are neither conformal nor equal-area, but are considered a compromise between the two, like the Robinson projection mentioned above. Interrupted projections such as the Goode homolosine are yet another option: it is equal-area, but it cuts the oceans open to reduce the distortion of shapes. I cannot think of any situation where I would choose that over a Hammer map when presenting data, but that is just a matter of personal taste I guess. No map projection is perfect; there are only some that are more right than others.
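To see where the stretched look of an equal-area world map comes from, here is a small sketch of the forward Hammer projection on a unit sphere, using its standard textbook formula. The function name is my own, and this is not necessarily what DataWrapper does internally.

```python
import math


def hammer(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Hammer x/y on a unit sphere."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    # Shared denominator of the Hammer formulas.
    denom = math.sqrt(1.0 + math.cos(phi) * math.cos(lam / 2.0))
    x = 2.0 * math.sqrt(2.0) * math.cos(phi) * math.sin(lam / 2.0) / denom
    y = math.sqrt(2.0) * math.sin(phi) / denom
    return x, y


# The eastern edge of the map and the north pole mark the half-axes of the ellipse.
edge_x, _ = hammer(180, 0)
_, pole_y = hammer(0, 90)
print(f"width / height of the world outline: {edge_x / pole_y:.2f}")
```

The whole world ends up inside an ellipse that is twice as wide as it is tall, so shapes near the edges have to be squeezed and sheared to keep the areas right.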

There are maps and there are data maps

While maps in their original form as reference maps or technical maps were meant to show locations, data maps, also called thematic maps, show a lot of other things as well. The World of Bullshit, showing (figuratively speaking) data on top of geography, is just one example. The map is such a fun visual model to use, and it can be so enlightening to see all your data mapped out. The first time I used thematic mapping was after running a K-Means algorithm on some qualitative data from different countries. The clusters became so much more telling on a map than as points in a scatterplot. While the code was done in Scala, the map was done in Tableau (and no, the world map in Tableau is not good; it is as Mercator as you can get, just look at Greenland vs. Africa!).

In [18]:
# <!-- collapse=True -->
PATH = "/Users/Margrethe/Data_Science/blog/jupyter-blog/content/img/"
Image(filename = PATH + "img_map_clusters_v2.png", width=500, height=350)
Out[18]:

The data used to define the colors in this map is categorical, with one color for each cluster.

The World of Bullshit is what is known as a choropleth map. These are maps that use shades of color to fill areas such as countries, states, etc. based on data. The most challenging thing with these maps, which I totally messed up with my BS map, is making sure the colors, groups, and legend align. You might not have given it a second thought until now, but look at the number of legend entries I have, and then look at the number of colors in the map. It just does not add up! It would have helped to show a range instead of separate legend entries. It could also help to make a diverging color scheme, keeping one class for each color. I could have calculated statistics for the classes, with the mean and standard deviation defining the colors. I could even have used a gradient scale from 0 to 100. Point is, before doing choropleth maps, get an overview of all the traps and how to avoid falling into them. This article can help a lot, and I wish I had spent the time to read it before making a map myself. Also, DataWrapper is great, but be aware of traps like this.
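The mean and standard deviation idea can be sketched like this. It is only an illustration: the country scores are made up, the hex colors are an illustrative light-to-dark ramp, and the helper names are my own. The point is that the class breaks, the colors and the legend labels are all derived from the same list, so they cannot get out of sync.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def std_dev_breaks(values):
    """Class breaks at mean - 1 sd, mean and mean + 1 sd (gives four classes)."""
    m = mean(values)
    s = pstdev(values)
    return [m - s, m, m + s]


def classify(value, breaks):
    """Index of the class a value falls into (0 = lowest class)."""
    return bisect_right(breaks, value)


# Made-up search interest scores on a 0-100 scale.
scores = {"A": 0, "B": 25, "C": 50, "D": 75, "E": 100}
breaks = std_dev_breaks(list(scores.values()))

# One color per class, light to dark (illustrative hex values).
palette = ["#ffffcc", "#a1dab4", "#41b6c4", "#225ea8"]
assert len(palette) == len(breaks) + 1  # colors and classes must match

# The legend is built from the same breaks that drive the colors.
edges = [min(scores.values())] + breaks + [max(scores.values())]
legend = [f"{lo:.0f} to {hi:.0f}" for lo, hi in zip(edges, edges[1:])]

for country, score in scores.items():
    k = classify(score, breaks)
    print(country, score, palette[k], legend[k])
```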

Then there are colors

As with everything, colors help the maps be both interesting and beautiful. But colors can also mess up an otherwise good model. In the literature, there are some common rules on how to use colors in visualizations, in addition to some that are specific to the world of thematic maps. While the first deserve a blog post of their own, I will try to run quickly through some of the most important ones in my view.

Be color blind sensitive. Approximately 5 % of the population is colorblind, so to make sure the colors do not make the visualization readable only for the remaining 95 %, ColorBrewer can help you choose color schemes.

Think in LCH space (L stands for lightness, C for chroma, i.e. saturation, and H for hue). Colors are subjective, but even more, colors are interpreted differently in our brains. Also be aware that lightness drives human perception more than saturation does. Colors such as yellow, blue and red should not be treated as equivalent: yellow will never be dark, while blue is and red can be. Also, by all means: avoid the rainbow palette. I love the talk Robert Simmon held on the subject in this video from OpenVis Conf in 2014.

In [17]:
# <!-- collapse=True -->
display(HTML("""<iframe width="560" height="315" src="https://www.youtube.com/embed/DjJr8D4Bxjw" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>"""))
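The point about lightness in the "Think in LCH space" advice can be checked with a few lines of code. This sketch converts an sRGB color to CIE L* (the L in LCH) using the standard sRGB and CIELAB formulas; the function name is my own.

```python
def srgb_to_lightness(r, g, b):
    """CIE L* (0 = black, 100 = white) of an 8-bit sRGB color."""

    def linearize(channel):
        # Undo the sRGB gamma curve.
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    # Relative luminance Y from the linear channels.
    y = 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)

    # CIELAB lightness function.
    delta = 6 / 29
    f = y ** (1 / 3) if y > delta ** 3 else y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16


for name, rgb in [("yellow", (255, 255, 0)), ("red", (255, 0, 0)), ("blue", (0, 0, 255))]:
    print(f"{name}: L* = {srgb_to_lightness(*rgb):.0f}")
```

Fully saturated yellow comes out almost as light as white, while fully saturated blue is quite dark, so using them side by side as if they were equal will mislead the reader.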

High numbers are dark, low are light. This is actually also true when drawing on a dark background. Want to know why? Listen to this Data Stories podcast episode with Karen Schloss.

Think sequential, diverging or qualitative. If you have data that goes from 0 to 100 like my Bullshit map, choose a palette with a linear change in lightness. If your data is divided between two poles, you should choose a palette consisting of two different hues that meet in a neutral color in the middle. For qualitative or categorical data the task is to separate areas into different categories, and you need to use different colors for that. You should choose colors that are as different from each other as possible.
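As a rough sketch of the difference between a sequential and a diverging palette, the helpers below build both by interpolating between colors. Note the assumptions: the function names and end colors are my own, and plain RGB interpolation is a simplification, since it does not give a perceptually even change in lightness. In practice a ready-made scheme from ColorBrewer is the safer choice.

```python
def lerp_color(start, end, t):
    """Linear interpolation between two RGB tuples, t in [0, 1]."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def sequential(light, dark, n):
    """n colors going from a light color (low values) to a dark one (high values)."""
    return [lerp_color(light, dark, i / (n - 1)) for i in range(n)]


def diverging(low, neutral, high, n):
    """n colors (n odd, n >= 3): two hues that meet in a neutral middle class."""
    half = n // 2
    left = [lerp_color(low, neutral, i / half) for i in range(half)]
    right = [lerp_color(neutral, high, i / half) for i in range(1, half + 1)]
    return left + [neutral] + right


white, navy = (255, 255, 255), (0, 0, 128)
blue, red = (0, 0, 255), (255, 0, 0)

print(sequential(white, navy, 5))
print(diverging(blue, white, red, 5))
```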

Keep a low number of different colors. While some claim that 2 colors are all you need, my creativity can't handle that. I need more colors in my life. But research has shown that one should keep the number of colors to a minimum. In class, we learned to use fewer than five colors if we want the viz to be read rapidly, and fewer than 12 colors to decode categorical values. Like Robert Simmon says in the video above, I would also rather recommend staying below 7 different colors in a visualization. Our brains would have a hard time differentiating between the color codes if 12 different categories were shown on a map.

Qualitative palette. A way to get more categories into a map is to group different color shades together. This is a beautiful example from NASA Earth Observatory that Simmon also showed off. Read more about the project here.

In [16]:
# <!-- collapse=True -->
Image(filename = PATH + "portland_etm_2001_lrg.jpg", width=300, height=400)
Out[16]:

Use maps when you need to show spatial arrangements

While I love mapping data, I have also learned that using a map for every dataset that has a geolocation attached to it is not the way to go; sometimes a good ol' bar chart can do the job. One of our visualization lecturers pointed out that we should always consider whether the spatial arrangement matters for the task at hand. Is it important to show where, in order to tell the story of the data? He had this example of comparing the number of inhabitants in the different parts of Denmark. It does not really give you anything to plot the regions using gradient colors when it is easier to compare them by looking at the numbers. He compared this to a map showing rising sea levels caused by global warming. Here the map made sense: we could see which parts of the world would be under water if the warming continues. It not only made sense, it told a story and made an impression (especially since Denmark was completely erased from the map). In my defense, the second map, with the clusters, is an example of when spatial arrangement matters; here it helps show patterns and draws on familiarity. Countries that are known to have similar cultures are clustered together (the data was about social perception among the population, if I remember correctly).

Another thing to consider when using a map is the quality of the data behind each geolocation. If the data is sampled, one should be aware of the problem with small populations, which often produce extreme values in the sample (over- or under-representing the true population).
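The small-population problem can be illustrated with the standard error of a sampled proportion, sqrt(p(1 - p) / n). This is a minimal sketch with made-up numbers; the function name is my own.

```python
import math


def rate_standard_error(true_rate, sample_size):
    """Standard error of a proportion estimated from a simple random sample."""
    return math.sqrt(true_rate * (1.0 - true_rate) / sample_size)


# Same underlying rate, very different sample sizes (made-up numbers).
true_rate = 0.5
for sample_size in (100, 10_000):
    se = rate_standard_error(true_rate, sample_size)
    print(f"n = {sample_size}: estimate typically off by about {se:.3f}")
```

With the same true rate, the area with only 100 respondents can easily land in the darkest or lightest class by chance alone, and on a choropleth that area may still cover a lot of the map.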

Jun 04, 2019

First Post

Welcome to my blog on Data Science

This is just a test post. I used this article as a reference when creating this blog. It is written using Jupyter Notebooks. Bookmark me and come back for more later!
